\chapter{Atomic Operations}
\label{chap:atomic-ops}

The operations we have discussed until now have consisted of lone read or write
operations (though we did examine both synchronous and asynchronous versions of
both). While such lone reads and write are very common, some applications
require read-modify-write operations that execute atomically.

For instance, take an application that wants to count the number of times each
word appears as it crawls the web. There will undoubtedly be many concurrent
threads fetching web documents, checking the datastore for the current count of
each word, inserting a "1" if that word has not been encountered before, and
otherwise reading the count, incrementing it and storing the updated value.

It's easy to see that implementing such an application with lone reads and
writes would immediately lead to an incorrect application.  One possible
scenario is that two crawlers will encounter the word at approximately the same
time in two different documents. They will both check whether the word exists,
note that it is not currently in the data store, and both crawlers will store a
"1", whereas the correct count should be 2. Another possible scenario is that
one crawler will encounter a word, read the current count, say 17, and then get
delayed because of VM scheduling. In the meantime, a fast crawler goes through
millions of documents, updating the count for that particular word to, say,
124,232. When the slow crawler resumes, it will blithely write an "18" for that
word, losing more than 124,000 of the occurrences it should have been tracking.

Atomic read-modify-write operations provide a solution to this problem.  Such
operations are guaranteed to execute without interference from other operations.
Atomic operations ensure that the read-modify-write sequence that comprises an
operation is executed in a manner that cannot be interrupted by or interleaved
with any other operation. The entire block is one atomic unit.

Let's see how they're used. The setup below is identical to the previous
chapter, so if you have that space already set up, you can skip it.

\section{Setup}
\label{sec:atomic-ops:setup}

As in the previous chapters, the first step is to deploy the cluster and connect
a client.   First we launch and initialize the coordinator:

\begin{consolecode}
hyperdex coordinator -f -l 127.0.0.1 -p 1982
\end{consolecode}

Next, let's launch a few daemon processes to store data.  Execute the following
commands (note that each instance binds to a different port and has a different
\code{/path/to/data}):

\begin{consolecode}
hyperdex daemon -f --listen=127.0.0.1 --listen-port=2012 \
                   --coordinator=127.0.0.1 --coordinator-port=1982 --data=/path/to/data1
hyperdex daemon -f --listen=127.0.0.1 --listen-port=2013 \
                   --coordinator=127.0.0.1 --coordinator-port=1982 --data=/path/to/data2
hyperdex daemon -f --listen=127.0.0.1 --listen-port=2014 \
                   --coordinator=127.0.0.1 --coordinator-port=1982 --data=/path/to/data3
\end{consolecode}

We now have three different daemons ready to serve in the HyperDex cluster.
Finally, we create a space which makes use of all three systems in the cluster.
In this example, let's create a space that may be suitable for storing friend
lists in a social network:

\begin{pythoncode}
>>> import hyperdex.admin
>>> a = hyperdex.admin.Admin('127.0.0.1', 1982)
>>> a.add_space('''
... space friendlists
... key username
... attributes
...    string first,
...    string last,
...    set(string) friends
... ''')
True
>>> import hyperdex.client
>>> c = hyperdex.client.Client('127.0.0.1', 1982)
\end{pythoncode}

Finally, our object for John Smith and some others

\begin{pythoncode}
>>> c.put('friendlists', 'jsmith1', {'first': 'John', 'last': 'Smith',
...                                  'friends': set(['bjones1', 'jd', 'jj'])})
True
>>> c.put('friendlists', 'jd', {'first': 'John', 'last': 'Doe'})
True
>>> c.put('friendlists', 'bjones1', {'first': 'Brian', 'last': 'Jones'})
True
\end{pythoncode}

\section{Atomic Operations}
\label{sec:atomic-ops:ops}

HyperDex supports a few different types of atomic operations. Perhaps the most
general one is the \code{cond\_put}.  A \code{cond\_put} performs an \code{put}
if and only if the value being updated matches a condition specified along with
the new values to be inserted.

In our example, let's say the application wants to change John Smith's name to
Jon Smith only if his name is unchanged, but wants the application to fail if
the application has changed his name since it was last read:

\begin{pythoncode}
>>> c.get('friendlists', 'jsmith1')
{'first': 'John', 'last': 'Smith', 'friends': set(['bjones1', 'jd', 'jj'])}
>>> c.cond_put('friendlists', 'jsmith1',
...            {'first': 'John', 'last': 'Smith'},
...            {'first': 'Jon'})
True
>>> c.get('friendlists', 'jsmith1')
{'first': 'Jon', 'last': 'Smith', 'friends': set(['bjones1', 'jd', 'jj'])}
\end{pythoncode}

Here we told HyperDex to update John's name if and only if it is currently equal
to "John".  The third argument is a set of attributes that must match the object
for the update to succeed.  The fourth argument takes the same form as a typical
\code{put}.

Not surprisingly, this request succeeded, as John's name matched the specified
values. Let's try issuing the same operation again.

\begin{pythoncode}
>>> c.cond_put('friendlists', 'jsmith1',
...            {'first': 'John', 'last': 'Smith'},
...            {'first': 'Jon'})
False
\end{pythoncode}

Notice that \code{cond\_put} failed because the value of the first name field is
no longer "John".

Note that the last argument has the same generality as the arguments to a
regular \code{put} operation. So there is no requirement that a \code{cond\_put}
check and update the same field. The following is a perfectly legitimate
operation that updates the first name field of Jon's profile if and only if his
set of friends has not changed:

\begin{pythoncode}
>>> c.cond_put('friendlists', 'jsmith1',
...            {'friends': set(['bjones1', 'jd', 'jj'])},
...            {'first': 'John'})
True
>>> c.get('friendlists', 'jsmith1')
{'first': 'John', 'last': 'Smith', 'friends': set(['bjones1', 'jd', 'jj'])}
\end{pythoncode}

The great thing about HyperDex is that \code{cond\_put} operations are fast.  In
fact, their performance is indistinguishable from a normal \code{put}, all else
being equal.  Thus, you can rely heavily upon \code{cond\_put} operations to
avoid race conditions without sacrificing performance.  Going a step further,
it's possible to use \code{async\_cond\_put} operations to achieve even more
concurrency without losing correctness.

Keep in mind that \code{cond\_put} operations can and will fail, as intended, if
there are interceding operations that update the object fields that must match.
In these cases, the client will typically want to re-fetch the object,
re-perform its updates, and re-submit the conditional operation.

Another useful atomic primitive HyperDex provides is the
\code{put\_if\_not\_exist} operation.  This operation succeeds if and only if
the object does not already exist.  This can be useful to implement locking
behavior.  For example, a simple lock space can be used to provide per-user
locking:

\begin{pythoncode}
>>> a.add_space('''
... space userlocks
... key username
... ''')
True
\end{pythoncode}

With this space, we can atomically check if an object exists, creating it in the
process; or fail the operation, leaving the object unchanged.

\begin{pythoncode}
>>> c.put_if_not_exist('userlocks', 'jsmith1', {})
True
>>> c.get('userlocks', 'jsmith1')
{}
>>> c.put_if_not_exist('userlocks', 'jsmith1', {})
False
>>> c.delete('userlocks', 'jsmith1')
True
\end{pythoncode}

Here's an illustrative example that shows how a test-and-set mechanism can be
used to implement \code{lock()} and \code{unlock()}.

\begin{pythoncode}
>>> def lock(client, user):
...     while not client.put_if_not_exist('userlocks', user, {}):
...         pass
>>> def unlock(client, user):
...     client.delete('userlocks', user)
>>> lock(c, 'jsmith1')
>>> unlock(c, 'jsmith1')
\end{pythoncode}

Note that a real implementation will not want to busy-loop, and will want to
make provisions for when clients fail while holding the lock.

\section{A Comprehensive Walk}
\label{sec:atomic-ops:walk}

Having built an intuition for how to structure and use the atomic operations,
let's go through them and illustrate the various atomic operations for each of
the different types. So, let's first create a space that can allow us to do
this:

\begin{pythoncode}
>>> a.add_space('''
... space alldatatypes
... key k
... attributes
...    string s,
...    int i,
...    float f,
...    list(string) ls,
...    set(string) ss,
...    map(string, string) mss,
...    map(string, int) msi''')
True
\end{pythoncode}

The key-based operations \code{put\_if\_not\_exist} and \code{cond\_put} can be
used to create the object if it does not exist, and to modify it only if certain
fields match expected values, respectively.

\begin{pythoncode}
>>> c.put_if_not_exist('alldatatypes', 'somekey', {'s': 'initial value'})
True
>>> c.put_if_not_exist('alldatatypes', 'somekey', {'s': 'initial value'})
False

>>> # cond_put.  First is predicate.  May be any valid search predicate
>>> c.cond_put('alldatatypes', 'somekey', {'s': 'initial value'}, {'s': 'some string'})
True
>>> c.cond_put('alldatatypes', 'somekey', {'s': 'initial value'}, {'s': 'some string'})
False
\end{pythoncode}

Note how the first operations succeeds, and the second one fails. Let's check to
make sure that our object reflects the changes we have applied:

\begin{pythoncode}
>>> c.get('alldatatypes', 'somekey')
{'f': 0.0, 'i': 0, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}
\end{pythoncode}

The operation \code{cond\_put\_or\_create} is a combination of \code{put\_if\_not\_exist} and \code{cond\_put}.  If the object does not exist, it will create the object, but if the object exists it will be modified only if certain fields match expected values.

\begin{pythoncode}
>>> c.cond_put_or_create('alldatatypes', 'anotherkey', {'s': 'a'}, {'s': 'b'})
True
>>> c.cond_put_or_create('alldatatypes', 'anotherkey', {'s': 'a'}, {'s': 'b'})
False
>>> c.get('alldatatypes', 'anotherkey')
{'f': 0.0, 'i': 0, 'mss': {}, 'ss': set([]), 's': 'b', 'ls': [], 'msi': {}}
>>> c.cond_put_or_create('alldatatypes', 'anotherkey', {'s': 'b'}, {'s': 'a'})
True
>>> c.get('alldatatypes', 'anotherkey')
{'f': 0.0, 'i': 0, 'mss': {}, 'ss': set([]), 's': 'a', 'ls': [], 'msi': {}}
\end{pythoncode}

Let's now perform some atomic operations on integers and floats. These are
self-explanatory, so we'll let the code do the talking. You will note that the
float "f" and integer "i" fields are the ones of interest here, the rest are
non-changing:

\begin{pythoncode}
>>> c.atomic_add('alldatatypes', 'somekey', {'i': 1, 'f': 0.25})
True
>>> c.get('alldatatypes', 'somekey')
{'f': 0.25, 'i': 1, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}

>>> c.atomic_sub('alldatatypes', 'somekey', {'i': 2, 'f': 0.5})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': -1, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}

>>> c.atomic_mul('alldatatypes', 'somekey', {'i': 2, 'f': 4.})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -1.0, 'i': -2, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}

>>> c.atomic_div('alldatatypes', 'somekey', {'i': 2, 'f': 4.})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': -1, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}

>>> c.put('alldatatypes', 'somekey', {'i': 0xdeadbeefcafe})
True
>>> c.atomic_and('alldatatypes', 'somekey', {'i': 0xffffffff0000})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 244837814042624, 'mss': {}, 'ss': set([]), 's': 'some string', 'ls': [], 'msi': {}}
>>> print "0x%x" % (c.get('alldatatypes', 'somekey')['i'],)
0xdeadbeef0000

>>> c.atomic_or('alldatatypes', 'somekey', {'i': 0x00000000cafe})
True
>>> print "0x%x" % (c.get('alldatatypes', 'somekey')['i'],)
0xdeadbeefcafe

>>> c.atomic_xor('alldatatypes', 'somekey', {'i': 0xdea5a0403af3})
True
>>> print "0x%x" % (c.get('alldatatypes', 'somekey')['i'],)
0x81eaff00d
\end{pythoncode}

Ok, now let's perform some atomic operations on strings:

\begin{pythoncode}
>>> c.string_prepend('alldatatypes', 'somekey', {'s': '->'})
True
>>> c.get('alldatatypes', 'somekey')['s']
'->some string'

>>> c.string_append('alldatatypes', 'somekey', {'s': '<-'})
True
>>> c.get('alldatatypes', 'somekey')['s']
'->some string<-'
\end{pythoncode}

Atomic operations on lists allow the client to push new items on the left or
right of the list:

\begin{pythoncode}
>>> c.put('alldatatypes', 'somekey', {'ls': ['B']})
True
>>> c.list_lpush('alldatatypes', 'somekey', {'ls': 'A'})
True
>>> c.get('alldatatypes', 'somekey')['ls']
['A', 'B']

>>> c.list_rpush('alldatatypes', 'somekey', {'ls': 'C'})
True
>>> c.get('alldatatypes', 'somekey')['ls']
['A', 'B', 'C']
\end{pythoncode}

Sets provide the whole range of atomic set operations:

\begin{pythoncode}
>>> c.set_add('alldatatypes', 'somekey', {'ss': 'C'})
True
>>> c.get('alldatatypes', 'somekey')['ss']
set(['C'])

>>> c.set_remove('alldatatypes', 'somekey', {'ss': 'C'})
True
>>> c.get('alldatatypes', 'somekey')['ss']
set([])

>>> c.set_union('alldatatypes', 'somekey', {'ss': set(['A', 'B', 'C'])})
True
>>> c.get('alldatatypes', 'somekey')['ss']
set(['A', 'C', 'B'])

>>> c.set_intersect('alldatatypes', 'somekey', {'ss': set(['A', 'B', 'Z'])})
True
>>> c.get('alldatatypes', 'somekey')['ss']
set(['A', 'B'])
\end{pythoncode}

Finally, we have maps. Maps provide two kinds of atomic operations: the kind
that atomically manipulates the map itself, and the kind that atomically
manipulates an element of the map.

Let's atomically add an element to the \code{mss} field while atomically adding
another element, with the same key, to the \code{msi} field.

\begin{pythoncode}
>>> c.map_add('alldatatypes', 'somekey', {'mss': {'mapkey': 'mapvalue'}, 'msi': {'mapkey': 16}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 16}}
\end{pythoncode}

Let's add another field to one of the maps:

\begin{pythoncode}
>>> c.map_add('alldatatypes', 'somekey', {'mss': {'tmp': 'delete me'}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'tmp': 'delete me', 'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 16}}
\end{pythoncode}

Let's now atomically delete that field. We need only specify its key, the value
does not matter for the \code{map\_remove} operation.

\begin{pythoncode}
>>> c.map_remove('alldatatypes', 'somekey', {'mss': 'tmp'})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 16}}
\end{pythoncode}

Now we can perform all of the preceding atomic operations on each of the
elements of the maps, atomically:

\begin{pythoncode}
>>> c.map_atomic_add('alldatatypes', 'somekey', {'msi': {'mapkey': 16}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 32}}

>>> c.map_atomic_sub('alldatatypes', 'somekey', {'msi': {'mapkey': -32}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 64}}

>>> c.map_atomic_mul('alldatatypes', 'somekey', {'msi': {'mapkey': 4}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 256}}

>>> c.map_atomic_div('alldatatypes', 'somekey', {'msi': {'mapkey': 64}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 4}}

>>> c.map_atomic_and('alldatatypes', 'somekey', {'msi': {'mapkey': 2}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 0}}

>>> c.map_atomic_or('alldatatypes', 'somekey', {'msi': {'mapkey': 5}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 5}}

>>> c.map_atomic_xor('alldatatypes', 'somekey', {'msi': {'mapkey': 7}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': 'mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 2}}

>>> c.map_string_prepend('alldatatypes', 'somekey', {'mss': {'mapkey': '->'}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': '->mapvalue'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 2}}

>>> c.map_string_append('alldatatypes', 'somekey', {'mss': {'mapkey': '<-'}})
True
>>> c.get('alldatatypes', 'somekey')
{'f': -0.25, 'i': 34874585101, 'mss': {'mapkey': '->mapvalue<-'}, 'ss': set(['A', 'B']), 's': '->some string<-', 'ls': ['A', 'B', 'C'], 'msi': {'mapkey': 2}}
\end{pythoncode}

\subsection{Atomic Operations on Documents}
Note that Hyperdex also supports atomic operations on documents. To do this, we
first have to create a space that has a field of the type document.

\begin{pythoncode}
>>> a.add_space('''
... space people
... key k
... attributes
...    document info''')
True
\end{pythoncode}

Now, we can insert data into this table.

\begin{pythoncode}
>>> Document = hyperdex.client.Document
>>> c.put('people', 'jane', {'info' : Document( {'name': 'Jane Doe', 'gender' : 'female', 'age' : 21, 'likes' : ['cornell', 'python']} )})
True
\end{pythoncode}

We might now want to modify this information of this person without rewriting
the whole document. For example, it might be Jane Doe's birthday and we want to
increment the age.  Fortunately, atomic operations also work on this kind of
data.  HyperDex is able to increment integers within documents by providing the
full path to the integer within the document.

\begin{pythoncode}
>>> c.atomic_add('people', 'jane', {'info.age' : 1})
True
>>> c.get('people', 'jane')
{'info': Document({"name": "Jane Doe", "gender": "female", "age": 22, "likes": ["cornell", "python"]})}
\end{pythoncode}

HyperDex will also prevent you from adding to non-numeric fields.

\begin{pythoncode}
>>> c.atomic_add('people', 'jane', {'info.gender' : 1})
False
\end{pythoncode}

You can also add new entries to the document using this method.

\begin{pythoncode}
>>> c.atomic_add('people', 'jane', {'info.children' : 1})
True
>>> c.get('people', 'jane')
{'info': Document({"name": "Jane Doe", "gender": "female", "age": 22, "children": 1, "likes": ["cornell", "python"]})}
\end{pythoncode}

Like for maps, there are other possible atomic operations. You might want to modify the name by prepending a title.

\begin{pythoncode}
>>> c.string_prepend('people', 'jane', {'info.name' : 'Dr. '})
True
>>> c.get('people', 'jane')
{'info': Document({"name": "Dr. Jane Doe", "gender": "female", "age": 22, "children": 1, "likes": ["cornell", "python"]})}
\end{pythoncode}

Similarly, we can append text. 

\begin{pythoncode}
>>> c.string_append('people', 'jane', {'info.name' : ', Jr.'})
True
>>> c.get('people', 'jane')
{'info': Document({"name": "Dr. Jane Doe, Jr.", "gender": "female", "age": 22, "children": 1, "likes": ["cornell", "python"]})}
\end{pythoncode}



HyperDex's atomic operations are extensive and very expressive.  They are
guaranteed to be applied in the same order on all replicas, so the state of the
object you are operating on is guaranteed to be the same, regardless of
failovers.
